1 |
The Role of human reference translation in machine translation evaluation
In: TDX (Tesis Doctorals en Xarxa) (2017)
BASE
2 |
Phrase table expansion for statistical machine translation with reduced parallel corpora: the Chinese-Spanish case
In: TDX (Tesis Doctorals en Xarxa) (2017)
BASE
3 |
Pattern-based automatic induction of domain adapted resources for social media analysis
In: TDX (Tesis Doctorals en Xarxa) (2016)
BASE
4 |
Automatic acquisition of lexical-semantic relations: gathering information in a dense representation
In: TDX (Tesis Doctorals en Xarxa) (2016)
BASE
5 |
The Structure of the lexicon in the task of the automatic acquisition of lexical information
In: TDX (Tesis Doctorals en Xarxa) (2015)
BASE
6 |
Verb SCF extraction for Spanish with dependency parsing ; Extracción de patrones de subcategorización de verbos en castellano con análisis de dependencias
BASE
7 |
Annotation of regular polysemy: an empirical assessment of the underspecified sense
In: TDX (Tesis Doctorals en Xarxa) (2013)
BASE
8 |
METANET4U: enhancing the European linguistic infrastructure ; METANET4U: aumentar la infraestructura lingüística europea
BASE
9 |
Los Nombres eventivos no deverbales en español
In: TDX (Tesis Doctorals en Xarxa) (2011)
BASE
10 |
FLaReNet: una red para fomentar los recursos lingüísticos ; Fostering language resources network: FLaReNet
BASE
11 |
El Proyecto CLARIN: Una infraestructura de investigación científica para las Humanidades y las Ciencias Sociales
BASE
15 |
Choosing which to use? A study of distributional models for nominal lexical semantic classification
BASE
16 |
The Spanish resource grammar: pre-processing strategy and lexical acquisition
BASE
17 |
Towards the automatic classification of complex-type nominals
BASE
20 |
Mining and exploiting domain-specific corpora in the PANACEA platform
Abstract:
The objective of the PANACEA ICT-2007.2.2 EU project is to build a platform that automates the stages involved in the acquisition, production, updating and maintenance of the large language resources required by, among others, MT systems. The development of a Corpus Acquisition Component (CAC) for extracting monolingual and bilingual data from the web is one of the most innovative building blocks of PANACEA. The CAC, which is the first stage in the PANACEA pipeline for building language resources, adopts an efficient and distributed methodology to crawl for web documents with rich textual content in specific languages and predefined domains. The CAC includes modules that can acquire parallel data from sites with in-domain content available in more than one language. To evaluate the CAC methodology extrinsically, we conducted several experiments that used crawled parallel corpora for the identification and extraction of parallel sentences using sentence alignment. The corpora were then successfully used for domain adaptation of machine translation systems.
Keywords:
Boilerplate removal; Corpus acquisition; IPR for language resources; Web crawling
URL: http://hdl.handle.net/10230/20416
BASE